Sains Malaysiana 52(10)(2023): 2971-2983
http://doi.org/10.17576/jsm-2023-5210-18
Classifying Severity of Unhealthy Air Pollution Events in Malaysia: A Decision Tree Model
(Mengelaskan Keparahan Kejadian Pencemaran Udara Tidak Sihat di Malaysia: Hasil Model Pokok Keputusan)
NURULKAMAL MASSERAN1,*, RAZIK RIDZUAN MOHD TAJUDDIN1 & MOHD TALIB LATIF2,3
1Department of Mathematical Sciences, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
2Department of Earth Sciences and Environment, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Selangor, Malaysia
3Department of Environmental Health, Faculty of Public Health, Universitas Airlangga, Surabaya, East Java 60115, Indonesia
Received: 16 June 2023/Accepted: 2 October 2023
Abstract
Data mining techniques are widely applied to real-world problems across many knowledge domains. This study proposes a severity measure, corresponding to the duration and intensity size of an event, for evaluating unhealthy air pollution events. It also proposes a decision tree as a predictive model for the binary classification of extreme and non-extreme unhealthy air pollution events, with the two classes defined by the threshold of the power-law behavior. Duration and intensity size were also identified as important related features. A case study was conducted using air pollution index data for Klang, Malaysia, from 1 January 1997 to 31 August 2020. The results show that the decision tree model provides a high degree of precision and generalization, classifying extreme and non-extreme air pollution severity events in the Klang area with 100% accuracy. In addition, duration size is the most influential feature leading to the occurrence of an extreme air pollution event. This study therefore suggests that authorities should take precautionary measures for pollution incidents with a consecutive duration exceeding 11 hours.
Keywords: Air pollution classification; data mining; extreme air pollution; predictive model
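To make the event characteristics in the abstract concrete, the following is a minimal sketch of how unhealthy events might be extracted from an hourly air pollution index series and screened with the 11-hour duration rule. Several details are assumptions that the abstract does not state: the cutoff of 100 for an unhealthy hour, the definition of intensity as summed exceedance above that cutoff, and all names (`Event`, `extract_events`, `needs_vigilance`). The single-threshold check is only a simplified stand-in for the fitted decision tree, and the readings are made up.

```python
from dataclasses import dataclass
from typing import List

UNHEALTHY_API = 100   # assumption: hours with API above 100 count as unhealthy
VIGILANCE_HOURS = 11  # duration threshold quoted in the abstract


@dataclass
class Event:
    start: int        # index of the first unhealthy hour
    duration: int     # number of consecutive unhealthy hours
    intensity: float  # assumed definition: summed exceedance above UNHEALTHY_API


def extract_events(hourly_api: List[float]) -> List[Event]:
    """Group consecutive unhealthy hours into events."""
    events: List[Event] = []
    start = None
    excess = 0.0
    for i, value in enumerate(hourly_api):
        if value > UNHEALTHY_API:
            if start is None:
                # A new event begins at this hour.
                start, excess = i, 0.0
            excess += value - UNHEALTHY_API
        elif start is not None:
            # The first healthy hour closes the running event.
            events.append(Event(start, i - start, excess))
            start = None
    if start is not None:
        # The series ended while an event was still in progress.
        events.append(Event(start, len(hourly_api) - start, excess))
    return events


def needs_vigilance(event: Event) -> bool:
    """Flag events whose consecutive duration exceeds 11 hours."""
    return event.duration > VIGILANCE_HOURS


# Made-up hourly readings: a 3-hour event followed by a 12-hour event.
series = [80, 105, 120, 101, 90] + [150] * 12 + [70]
events = extract_events(series)
flags = [needs_vigilance(e) for e in events]
```

With these illustrative readings, only the second event exceeds the 11-hour threshold. A full reproduction would instead fit a classification tree on duration and intensity size, with class labels derived from the power-law threshold on severity.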
*Corresponding author; email: kamalmsn@ukm.edu.my